Statistical equilibrium in deterministic cellular automata


Siamak Taati 


Abstract. Some deterministic cellular automata have been observed to follow the
pattern of the second law of thermodynamics: starting from a partially disordered
state, the system evolves towards a state of equilibrium characterized by maximal
disorder. This chapter is an exposition of this phenomenon and of a statistical
scheme for its explanation. The formulation is in the same vein as Boltzmann’s
ideas, but the simple combinatorial setup offers clarification and hope for generic
mathematically rigorous results. Probabilities represent frequencies and subjective
interpretations are avoided.


1 Introduction 


The aim of statistical mechanics is to bridge the gap between the microscopic and macroscopic
behaviour of systems consisting of a large number of interacting components. The
prime example is a gas of particles moving and interacting according to the laws of 
mechanics, giving rise to macroscopic behaviour described in thermodynamics. The 
kinetic theory of gases, initiated by Clausius, Maxwell and Boltzmann, takes on the 
task of explaining the macroscopic behaviour of a gas on the basis of its microscopic 
description. 

The main problem in kinetic theory is the derivation of the second law of thermodynamics
(i.e., the tendency of an isolated thermodynamic system to evolve towards
more disordered states). Starting from a collection of particles pictured as hard balls
interacting through elastic collisions and using a simplifying (though erroneous)
statistical assumption about the number of collisions of each type occurring in a
small time interval (the Stosszahlansatz), Boltzmann was able to derive a version of
the second law by showing that a certain quantity measuring disorder (Boltzmann’s
entropy) increases monotonically with time and is maximized precisely when the
system is in equilibrium.

Although the second law of thermodynamics was originally formulated for thermodynamic
systems, its applicability goes beyond a system of particles following
the particular laws of (classical or quantum) mechanics. A mathematical understanding
of the precise circumstances leading to the applicability of the more general
law of tendency towards disorder is desirable but missing.

The purpose of this chapter is to demonstrate examples of results and experimental
observations regarding the so-called randomization behaviour in cellular
automata (going back to Miyamoto, Wolfram and Lind) that could be thought of
as instances of this generalized version of the second law of thermodynamics. Notably,
neither probabilistic hypotheses (i.e., incorporating intrinsic randomness in
the model) nor subjective interpretations (see [?]) are needed: probabilities enter
the picture only as intuitive means of representing statistical data. The combinatorial
setting of cellular automata is simple enough that one could attempt to find
generic mathematical conditions that guarantee the applicability of the second law.
At the same time, the rich range of behaviour among cellular automata makes the
challenge interesting and non-trivial.

The scenario is briefly as follows. Consider a configuration that is atypical of
the maximally disordered state (so that there is a bias in the frequency of the patterns)
but is not too rigidly regular either (e.g., it is not periodic). Over time,
a sufficiently chaotic cellular automaton shuffles such a configuration (albeit deterministically)
in such a way that the bias gradually becomes undetectable. More
specifically, the configuration of the system becomes more and more typical of the
maximally disordered state, up to wider and wider ranges of observation.


2 Randomization Phenomenon: Examples 
2.1 XOR cellular automaton 

On the space of all bi-infinite sequences of symbols 0 and 1, consider a transformation
$T$ that maps a sequence $x$ into another sequence $Tx$ defined by $(Tx)_i = x_i + x_{i+1} \pmod 2$.
In other words, $T$ replaces each symbol with the sum (modulo 2) of that
symbol and its right neighbour. The iteration of $T$ defines a dynamical system
on $\{0,1\}^{\mathbb{Z}}$, which we refer to as the XOR cellular automaton. Each sequence in
$\{0,1\}^{\mathbb{Z}}$ will be called a configuration of the system. A sample trajectory of this
system is depicted in Figure 1.

The map $T$ is continuous with respect to the product topology. The product
topology is the topology in which two configurations are considered “close” if
they agree on a large region around the origin. Convergence in the product topology
corresponds to site-wise eventual agreement. Another basic property of $T$ is
its translation symmetries. Namely, if $\sigma^k$ denotes the translation by $k$ (that is,
$(\sigma^k x)_i = x_{k+i}$), then $T\sigma^k = \sigma^k T$ for every $k \in \mathbb{Z}$. The map $T$ is also additive, meaning
$T(x + y) = Tx + Ty$, where the addition is performed site-wise and modulo 2.
Although $T$ is not invertible, it is onto and “almost one-to-one” in that every configuration
$y \in \{0,1\}^{\mathbb{Z}}$ has precisely 2 pre-images. Namely, choosing a symbol $x_0$
arbitrarily, we can find, recursively, unique values for the symbols $x_i$, for $i > 0$ and
$i < 0$, such that $Tx = y$.

[Figure 1: space-time diagram (axis label: time)]

Fig. 1 A sample trajectory of the XOR cellular automaton starting with a configuration consisting
of a single 1 at the origin and 0 everywhere else.

XOR stands for “exclusive OR”.
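A minimal Python sketch of the XOR map, assuming a finite ring $\mathbb{Z}_N$ in place of $\mathbb{Z}$ (the same simplification used for the simulations later in this chapter); the names `xor_step` and `preimages` are illustrative only. The pre-image search follows the recursion described above: choose $x_0$, then set $x_{i+1} = y_i + x_i$. On a ring the recursion must also close up consistently, so a configuration has exactly two pre-images (each the complement of the other) when it contains an even number of 1s, and none otherwise.

```python
import numpy as np

def xor_step(x):
    """(Tx)_i = x_i + x_{i+1} (mod 2), with indices taken modulo len(x)."""
    return x ^ np.roll(x, -1)

def preimages(y):
    """All x with xor_step(x) == y, built from x_{i+1} = y_i + x_i (mod 2)."""
    found = []
    for x0 in (0, 1):                       # free choice of the symbol at site 0
        x = np.empty_like(y)
        x[0] = x0
        for i in range(len(y) - 1):
            x[i + 1] = y[i] ^ x[i]
        if np.array_equal(xor_step(x), y):  # the wrap-around equation must hold too
            found.append(x)
    return found

# Trajectory of a single 1 (compare with Figure 1): the pattern grows to the left.
x = np.zeros(17, dtype=np.uint8)
x[13] = 1
for t in range(4):
    print("".join("■" if s else "□" for s in x))
    x = xor_step(x)
```

The loop prints four rows of the space-time diagram, one per time step, each row being the image of the previous one under $T$.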

Slightly less obvious is the following balance property of $T$: if we choose the
symbols in $x$ by independent unbiased coin flips, the symbols in $Tx$ will also be
indistinguishable from independent unbiased coin flips. In other words, the uniform
Bernoulli measure is invariant under $T$. To see this, take any block of $n$ consecutive
symbols $b_1 b_2 \cdots b_n$ and consider the probability that $Tx$ takes values $b_1 b_2 \cdots b_n$ at
positions $k$ to $k + n - 1$. There are precisely two choices of values $a_1 a_2 \cdots a_{n+1}$ and
$a'_1 a'_2 \cdots a'_{n+1}$ that, if on $x$ at positions $k$ to $k + n$, lead to the desired values $b_1 b_2 \cdots b_n$
on $Tx$ at sites $k$ to $k + n - 1$. Each of these two choices has probability $2^{-(n+1)}$ of
appearing in independent flips of an unbiased coin. Therefore, $b_1 b_2 \cdots b_n$ appears at
positions $k$ to $k + n - 1$ of $Tx$ with probability $2 \cdot 2^{-(n+1)} = 2^{-n}$.
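The counting argument above can be checked by brute force for small $n$. The sketch below (function names are illustrative) maps every block of length $n + 1$ to its image of length $n$ under the XOR rule and counts how often each image occurs.

```python
from collections import Counter
from itertools import product

def xor_block_image(a):
    """Image of an (n+1)-block under the XOR rule: an n-block."""
    return tuple((a[i] + a[i + 1]) % 2 for i in range(len(a) - 1))

def preimage_counts(n):
    """For each n-block b, count the (n+1)-blocks a with xor_block_image(a) == b."""
    return Counter(xor_block_image(a) for a in product((0, 1), repeat=n + 1))

n = 5
counts = preimage_counts(n)
# Every one of the 2**n blocks has exactly two pre-image blocks, so under
# unbiased coin flips each block of Tx has probability 2 * 2**-(n+1) = 2**-n.
print(len(counts), set(counts.values()))   # 32 {2}
```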

Besides the balance property, the XOR cellular automaton has a wealth of other
statistical regularities. For instance, if $x$ is chosen according to a uniform Bernoulli
measure (i.e., with independent unbiased coin flips), then for any $n$, the sequence
of blocks $(T^t x)_{[0,n)}$, $t = n, n+1, \ldots$ is independent of the block $x_{[0,n)}$. It follows
from the law of large numbers that, almost surely, every pattern $a_1 a_2 \cdots a_n$ appears
with the same frequency $2^{-n}$ in the space-time column $\xi = \xi_0 \xi_1 \xi_2 \cdots$, where $\xi_t = (T^t x)_0$.

The same sort of eventual independence holds along any other “space-time direction”:
for every $a, b \in \mathbb{Z}$ and a sufficiently large $t_0$, the tilted column of
blocks $(T^t x)_{[at+b,\, at+b+n)}$ with $t \geq t_0$ is independent of $x_{[b,b+n)}$.

Yet, the most remarkable property of the XOR cellular automaton for us is its
randomization effect: if $x$ is chosen using independent flips of a biased coin (say,
with probability $p \in (0,1)$ of having 1 at each site), then $T$ will gradually randomize
$x$, meaning $T^t x$ will be asymptotically indistinguishable (in distribution) from a
configuration chosen using independent flips of an unbiased coin as $t \to \infty$, provided
we ignore a negligible set of time steps $t$ (Figure 2).



Fig. 2 Randomization effect of the XOR cellular automaton. The starting configuration is chosen 
by independent biased coin flips with probability p = 0.1 of having 1 at each site. Ignoring a 
negligible set of time steps (represented by gray lines), the configuration quickly becomes almost 
uniform. 
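A rough numerical sketch of Figure 2 that tracks only the frequency of 1s, assuming a ring of $N = 100\,000$ sites (much longer than the number of steps, so wrap-around plays no role) and $p = 0.1$ as in the caption; the seed and function names are illustrative. The exceptional times are explained by a standard fact not stated in the text: since $T = I + \sigma$ is additive, $T^t x_i$ is the sum mod 2 of those $x_{i+j}$ with $\binom{t}{j}$ odd, and by Lucas' theorem there are $2^{s(t)}$ of them, where $s(t)$ is the number of 1s in the binary expansion of $t$. In particular $T^{2^k} x_i = x_i + x_{i+2^k}$, so at $t = 2^k$ the density of 1s is only $2p(1-p)$.

```python
import numpy as np

def xor_step(x):
    return x ^ np.roll(x, -1)            # (Tx)_i = x_i + x_{i+1} (mod 2) on a ring

def density_of_ones(p=0.1, n_sites=100_000, t_max=64, seed=0):
    """Frequency of 1s in T^t x for t = 0..t_max, x drawn by biased coin flips."""
    rng = np.random.default_rng(seed)
    x = (rng.random(n_sites) < p).astype(np.uint8)
    densities = [x.mean()]
    for _ in range(t_max):
        x = xor_step(x)
        densities.append(x.mean())
    return densities

def predicted_density(p, t):
    """T^t x_i is the XOR of 2**popcount(t) independent Bernoulli(p) bits."""
    m = 2 ** bin(t).count("1")
    return (1 - (1 - 2 * p) ** m) / 2    # probability that their sum is odd

dens = density_of_ones()
for t in (31, 32, 63, 64):
    print(t, round(dens[t], 3), round(predicted_density(0.1, t), 3))
```

At $t = 31$ and $t = 63$ the density is essentially $1/2$, while at $t = 32$ and $t = 64$ it drops to about $0.18$; such powers of 2 form a set of density 0.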


To state this more precisely, we need some notation and terminology. A (Borel)
probability measure on $\{0,1\}^{\mathbb{Z}}$ is uniquely identified by the probabilities it associates
to the cylinder sets

$$[a_k a_{k+1} \cdots a_{k+n}] = \{x \in \{0,1\}^{\mathbb{Z}} : x_k x_{k+1} \cdots x_{k+n} = a_k a_{k+1} \cdots a_{k+n}\}.$$


For instance, for the Bernoulli measure with parameter $p$ (the distribution of independent
flips of a biased coin with probability $p$ of having 1), which we will denote
by $\mathcal{B}_p$, we have

$$\mathcal{B}_p([a]) = p^{\#_1(a)}(1-p)^{\#_0(a)}$$

for any block $a = a_k a_{k+1} \cdots a_{k+n}$, where $\#_1(a)$ and $\#_0(a)$ denote, respectively, the
number of 1s and 0s appearing in $a$. The image of a probability measure $\pi$ under $T$
is another probability measure $T\pi$ with $(T\pi)(E) = \pi(T^{-1}E)$ for any measurable set
$E$. This is the distribution of $Tx$ if $x$ is chosen at random according to $\pi$. A sequence
of probability measures $\nu_1, \nu_2, \ldots$ is said to converge weakly to a measure $\pi$ if the
probabilities that $\nu_i$ associates to each fixed cylinder converge to the probability of
that cylinder according to $\pi$.

By the above-mentioned balance property, $T\mathcal{B}_{1/2} = \mathcal{B}_{1/2}$. Miyamoto [?] and
Lind [24] (following experimental observations made by Wolfram [38]) proved that

Theorem 1. There is a set $J \subseteq \mathbb{N}$ of density 1 such that for every $p \in (0,1)$, $T^t\mathcal{B}_p \to
\mathcal{B}_{1/2}$ as $t \to \infty$ within $J$.
Here, the density of a set of non-negative integers $J$ is defined as

$$d(J) = \lim_{n\to\infty} \frac{|J \cap [0,n)|}{n}$$

when the limit exists. The theorem in particular implies that the Cesàro averages
$\frac{1}{n}\sum_{t=0}^{n-1} T^t\mathcal{B}_p$ converge to $\mathcal{B}_{1/2}$ as $n \to \infty$.

The randomization behaviour of the XOR cellular automaton can be seen as an
analogue (or an instance) of the second law of thermodynamics: the system evolves
towards an equilibrium in a macroscopic state with highest degree of disorder. Here,
the term macroscopic is understood as synonymous with statistical: the macroscopic
state of a configuration $x$ consists of the frequency of occurrence of every finite word
$a \in \{0,1\}^*$ in $x$. This information is encapsulated conveniently in a translation-invariant
probability measure $\pi_x$ that is defined by those frequencies and which has
$x$ as a “typical element”. The equilibrium state (the uniform Bernoulli measure) is
the least presumptive (most random) state: every word of length $n$ has the same
frequency $2^{-n}$. In Sections 3 and 4 we shall make this interpretation more precise.

The starting configuration does not need to be Bernoulli for the XOR cellular
automaton to randomize it. A random configuration which is a realization of a (bi-infinite)
$k$-step Markov chain with positive transition probabilities is also randomized
by the XOR cellular automaton. In other words, the conclusion of Theorem 1
remains true if $\mathcal{B}_p$ is replaced with a full-support Markov measure $\pi$. More generally,
randomization is known to occur as long as the starting measure is harmonically
mixing [29, 30]. A complete characterization of the measures randomized by the
XOR map is nevertheless missing.


2.2 A reversible cellular automaton 

The analogy with the second law of thermodynamics would have been stronger
if the XOR cellular automaton were reversible or symmetric under time reversal.
Consider now a different cellular automaton acting on the configurations of symbols
from $S = \{0,1\} \times \{0,1\}$. Each site $i \in \mathbb{Z}$ of a configuration $(x,y)$ carries two symbols
$x_i$ and $y_i$, and the cellular automaton map $T$ is defined by $(T(x,y))_i = (y_i, x_i + y_{i+1})$,
where the addition is again modulo 2. Let us call this the XOR-transpose cellular
automaton. As in the XOR example, the map $T$ is continuous and translation-invariant.
It is also additive and has the balance property: the uniform Bernoulli
measure $\mu$ on $S^{\mathbb{Z}}$ is invariant under $T$. Unlike the XOR cellular automaton, the
XOR-transpose is reversible and time-reversal symmetric: one can traverse backwards
in time by switching the two symbols at each site, applying $T$, and switching them back.
Maass and Martinez [25] proved that the XOR-transpose cellular automaton has a similar
randomization property as the XOR cellular automaton (Figure 3):

Theorem 2. Let $\pi$ be the distribution of a single-step Markov chain on $S$ with positive
transition probabilities. Then, $\frac{1}{n}\sum_{t=0}^{n-1} T^t\pi$ converges to the uniform Bernoulli
measure $\mu$ as $n \to \infty$.

The convergence of the Cesàro averages implies the existence of a set $J \subseteq \mathbb{N}$ of
density 1 of time steps $t$ along which $T^t\pi$ converges to $\mu$ (see [?], Corollary 1.4),
but the set $J$ might, in principle, depend on the measure $\pi$.

Fig. 3 Randomization effect of the reversible cellular automaton of Maass and Martinez. In the
initial configuration, each component of the symbol at each site is chosen with an independent coin
flip with probability 0.1 of having a 1.

For the definition of harmonic mixing and basic properties of the class of harmonically mixing
measures, see [29, 30]. The result of [?] covers also the measures with complete connections and
summable decay. These, however, turn out to be included in the class of harmonically mixing
measures [?].

A cellular automaton is said to be reversible if it is bijective and has another cellular automaton
as inverse. This is equivalent to bijectivity, because the configuration space is compact and metric.

A reasonable definition of time-reversal symmetry for cellular automata is that $T$ is reversible and
there is another reversible cellular automaton $R$ such that $T^{-1} = R^{-1}TR$; see [?].

A reader familiar with Kac’s ring model (see [?], Section III.14) might recognize some similarity.
The infinite version of Kac’s model can be defined with $(T(x,y))_i = (x_i, x_i + y_{i+1})$. The first
component represents the presence or absence of marks on each site and the second the colour of
the balls.

The name is suggested by the fact that the space-time diagrams of this cellular automaton are
obtained from the space-time diagrams of a variant of the XOR cellular automaton with
$(Tx)_i = x_{i-1} + x_{i+1} \pmod 2$ by exchanging the role of time and space.

A cellular automaton may in fact be defined as a map on a symbolic configuration space $S^{\mathbb{Z}^d}$
that is continuous and invariant under translations. These are precisely the maps defined
homogeneously using local update rules [?].

The balance property is shared among all cellular automaton maps that are onto [?].

More specifically, $T$ has an inverse given by $(T^{-1}(x,y))_i = (y_i + x_{i+1}, x_i)$. Setting $R(x,y) = (y,x)$,
we can write the inverse map as $R^{-1}TR$.
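A sketch of the XOR-transpose map on a finite ring (an assumption made only for illustration), with the inverse implemented as $R^{-1}TR$: swap the two components, apply $T$, swap back. Running forward and then backward for the same number of steps recovers the initial configuration. The function names are illustrative.

```python
import numpy as np

def xor_transpose(x, y):
    """(T(x, y))_i = (y_i, x_i + y_{i+1}) (mod 2) on a ring."""
    return y.copy(), x ^ np.roll(y, -1)

def swap(x, y):
    """R(x, y) = (y, x): exchange the two symbols at each site."""
    return y.copy(), x.copy()

def inverse(x, y):
    """T^{-1} = R^{-1} T R; R is its own inverse, so: swap, apply T, swap back."""
    return swap(*xor_transpose(*swap(x, y)))

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=20, dtype=np.uint8)
y = rng.integers(0, 2, size=20, dtype=np.uint8)
u, v = x, y
for _ in range(100):            # run forward 100 steps ...
    u, v = xor_transpose(u, v)
for _ in range(100):            # ... then backward 100 steps
    u, v = inverse(u, v)
print(np.array_equal(u, x) and np.array_equal(v, y))   # True
```

Unfolding `inverse` gives $(y_i + x_{i+1}, x_i)$, which is the explicit formula for $T^{-1}$ stated above.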
2.3 A bi-permutative cellular automaton


As a third example, let us look at the cellular automaton with symbol set $S =
\{0,1,2\}$, defined by

$$(Tx)_i = \varphi(x_{i-1}, x_i, x_{i+1}) = \begin{cases} x_{i-1} + x_{i+1} + 1 \pmod 3 & \text{if } x_i = 2, \\ x_{i-1} + x_{i+1} \pmod 3 & \text{otherwise.} \end{cases} \tag{1}$$


Unlike the last two examples, the map $T$ is not additive. Nevertheless, the local
rule $\varphi$ is bi-permutative, which is to say both $a \mapsto \varphi(a,b,c)$ and $c \mapsto \varphi(a,b,c)$ are
permutations. It follows, similarly as in the case of the XOR cellular automaton,
that the map $T$ is 9-to-1. Like the last two examples, the uniform Bernoulli measure
(i.e., the distribution of a configuration chosen at random by flipping an unbiased
“3-sided coin” independently for each site) is invariant under $T$. The bi-permutativity
also ensures other statistical regularities for $T$, similar to those enjoyed by the XOR
cellular automaton [?].
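A quick check of the two claims above for the rule $\varphi$ of Equation (1), written as a sketch with illustrative function names: bi-permutativity is tested by brute force, and the 9-to-1 property is seen at the level of finite blocks, where every block of length $n$ has exactly $9 = 3^2$ pre-image blocks of length $n + 2$. As in the XOR argument, equal pre-image counts are what make the uniform Bernoulli measure invariant.

```python
from collections import Counter
from itertools import product

def phi(a, b, c):
    """Local rule of Equation (1) on S = {0, 1, 2}."""
    return (a + c + 1) % 3 if b == 2 else (a + c) % 3

def is_bipermutative(rule, symbols=(0, 1, 2)):
    """Check that a -> rule(a, b, c) and c -> rule(a, b, c) are bijections."""
    for b, other in product(symbols, repeat=2):
        if len({rule(a, b, other) for a in symbols}) != len(symbols):
            return False
        if len({rule(other, b, c) for c in symbols}) != len(symbols):
            return False
    return True

def block_preimage_counts(rule, n, symbols=(0, 1, 2)):
    """For each n-block, count the (n+2)-blocks mapped onto it by the local rule."""
    def image(block):
        return tuple(rule(*block[i:i + 3]) for i in range(len(block) - 2))
    return Counter(image(a) for a in product(symbols, repeat=n + 2))

print(is_bipermutative(phi))                         # True
print(set(block_preimage_counts(phi, 3).values()))   # {9}
```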

It is not known whether the above cellular automaton has a randomizing behaviour
in the sense of Theorem 1 or Theorem 2. Nevertheless, there is experimental evidence
suggesting that $T$ indeed randomizes biased Bernoulli configurations (Figure 4).
The graphs in Figure 4 depict the change in time of the empirical entropies of
words of length 1, 3 and 7 in consecutive configurations of this cellular automaton,
starting from a biased Bernoulli configuration. More specifically, a single pseudo-random
configuration $x$ is picked by simulating independent biased (3-sided) coin
flips, and iterations of $T$ on $x$ are obtained for up to 50 time steps. At each time
step, the empirical entropy of words of length $k$ (for $k = 1, 3, 7$) appearing in the
current configuration is calculated as follows. For each word $w$ of length $k$ with
symbols from $S$, let $\zeta_w(y)$ denote the frequency of appearance of the word $w$ in
configuration $y$. The empirical entropy of words of length $k$ appearing in $y$ is defined
as

$$H_k(y) = -\sum_{w \in S^k} \zeta_w(y) \log \zeta_w(y).$$

Figure 4 shows that the empirical entropy $H_k(T^t x)$ rapidly increases to reach its
maximum at around $k \log 3$, where it stays.

The empirical entropy $H_k(y)$ is a measure of disorder in $y$. It is maximized when
all words of length $k$ appear in $y$ with approximately the same frequency. For instance,
a configuration chosen using independent unbiased coin flips (which is considered
to be maximally disordered) has, with probability 1, the maximum empirical
entropy $H_k$ for each $k$. The empirical entropy $H_k(y)$ should be compared with Boltzmann’s
entropy (see below). Although not exhaustive, the simulation in Figure 4
suggests a gradual approach towards a maximally disordered state.

Fig. 4 Evidence of randomization in the bi-permutative cellular automaton defined in Equation (1).
The starting configuration $x$ (on a ring of length 50000) is chosen with independent flips
of a biased 3-sided coin with distribution $(0 \mapsto 0.95,\ 1 \mapsto 0.025,\ 2 \mapsto 0.025)$.

Notice that the XOR cellular automaton is also bi-permutative.

Or rolling a 3-sided die, if you wish.

Instead of infinite configurations, configurations of symbols on a large ring (indexed by $\mathbb{Z}_N$ for
$N$ large) are used.
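A minimal sketch of the experiment behind Figure 4, assuming a ring of length 50000 and the coin distribution from the caption; the seed, the use of the natural logarithm and the names `phi_step` and `empirical_entropy` are our own choices, so the output is not expected to reproduce the published curves exactly. Frequencies $\zeta_w(y)$ are counted over all cyclic windows of the ring.

```python
from collections import Counter
from math import log

import numpy as np

def phi_step(x):
    """One step of Equation (1) on a ring: x_{i-1} + x_{i+1} (+1 if x_i == 2), mod 3."""
    left, right = np.roll(x, 1), np.roll(x, -1)
    return (left + right + (x == 2)) % 3

def empirical_entropy(y, k):
    """H_k(y) = -sum_w zeta_w(y) log zeta_w(y), over cyclic windows of length k."""
    seq = y.tolist()
    ext = seq + seq[:k - 1]                 # wrap around the ring
    n = len(seq)
    counts = Counter(tuple(ext[i:i + k]) for i in range(n))
    return -sum((c / n) * log(c / n) for c in counts.values())

rng = np.random.default_rng(0)
x = rng.choice(3, size=50_000, p=[0.95, 0.025, 0.025])   # biased 3-sided coin
for t in range(51):
    if t % 10 == 0:
        print(t, [round(empirical_entropy(x, k), 3) for k in (1, 3, 7)])
    x = phi_step(x)
```

For comparison, the maxima are $k \log 3 \approx 1.099,\ 3.296,\ 7.690$ for $k = 1, 3, 7$; with only 50000 sites and $3^7 = 2187$ words of length 7, the estimate of $H_7$ necessarily stays slightly below its maximum.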


2.4 Rule 30 

Yet another example where randomization seems to be present is the so-called
Rule 30 cellular automaton. The Rule 30 cellular automaton has the binary alphabet
$\{0,1\}$ as the symbol set and may be defined by the logical expression

$$(Tx)_i = \varphi(x_{i-1}, x_i, x_{i+1}) = x_{i-1}\ \mathrm{XOR}\ (x_i\ \mathrm{OR}\ x_{i+1}),$$

where XOR denotes “exclusive or”. The local rule $\varphi$ is not bi-permutative, but it is
left-permutative (i.e., $a \mapsto \varphi(a,b,c)$ is a bijection for each $b$ and $c$). This still implies
that the map $T$ is onto, and that each configuration has at most 4 pre-images under
$T$. It follows that the Rule 30 cellular automaton again has the balance property.
Starting from an unbiased Bernoulli configuration, the Rule 30 cellular automaton
enjoys similar statistical regularities as in the previous examples, along almost all
space-time directions.

This cellular automaton was first studied by Wolfram [?]. He noticed that even
with a simple starting configuration, the iterations of the Rule 30 cellular automaton
produce complex, seemingly unpredictable patterns. He proposed a method for generation
of pseudo-random sequences by initializing the Rule 30 cellular automaton
with a “seed” and picking the symbols appearing on a particular site every few time
steps, which he tested against standard statistical randomness tests.

For the interpretations of entropy, see e.g. [?].

More specifically, $T\sigma^k$ is an exact endomorphism unless $k = -1$.

Figure 5 shows evidence for randomization in the Rule 30 cellular automaton
starting from biased Bernoulli configurations. The empirical entropies are calculated
as in the previous example.



Fig. 5 Evidence of randomization in the Rule 30 cellular automaton. The starting configuration
$x$ (on a ring of length 50000) is chosen with independent flips of a biased coin with probability
$p = 0.05$ of having a 1.
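A small sketch confirming that the logical expression above is indeed Wolfram’s rule number 30 and that the balance property holds at the level of blocks. It relies on the standard numbering convention (not spelled out in the text): the binary digit of the rule number at position $4a + 2b + c$ is the new value for the neighbourhood $(a, b, c)$. The function names are illustrative.

```python
from collections import Counter
from itertools import product

def rule30(a, b, c):
    """phi(a, b, c) = a XOR (b OR c)."""
    return a ^ (b | c)

def wolfram_code(rule):
    """Rule number: bit 4a + 2b + c holds the image of the neighbourhood (a, b, c)."""
    return sum(rule(a, b, c) << (4 * a + 2 * b + c)
               for a, b, c in product((0, 1), repeat=3))

def block_preimage_counts(rule, n):
    """For each n-block, count the (n+2)-blocks mapped onto it by the local rule."""
    def image(block):
        return tuple(rule(*block[i:i + 3]) for i in range(len(block) - 2))
    return Counter(image(a) for a in product((0, 1), repeat=n + 2))

print(wolfram_code(rule30))                 # 30
counts = block_preimage_counts(rule30, 6)
print(len(counts), set(counts.values()))    # 64 {4}
```

Left-permutativity is what makes every count equal to 4: once the two rightmost symbols of a pre-image block are fixed, the remaining symbols are forced one by one from right to left via $a_i = b_i\ \mathrm{XOR}\ (a_{i+1}\ \mathrm{OR}\ a_{i+2})$.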


2.5 Q2R spin dynamics 

One feature that is common among the first three examples (and is suspected for
the Rule 30 cellular automaton) is the absence of conserved energy-like quantities
[?]. A non-trivial conserved quantity would partition the macroscopic states
into inescapable fibers, hence preventing complete randomization. Nevertheless,
one might still expect randomization within each fiber.

The next example is based on the configurations of the Ising model, and was introduced
by Vichniac [?]. The Ising model is a model of ferromagnetism; each site
of an infinite two-dimensional square lattice (indexed by $\mathbb{Z}^2$) carries a symbol $\uparrow$ or $\downarrow$
representing two possible directions of a magnetic spin. The interaction between
spins is modelled by assigning energy $-1$ or $+1$ to any pair of neighbouring sites
whose symbols are, respectively, aligned or anti-aligned spins. The energy content
of a region is the sum of the interaction energy of neighbouring sites in that region.
Hence, lower energy in a region corresponds to an average tendency of neighbouring
spins to be aligned.

Rule 30 is in fact used in Mathematica as one of the methods for pseudo-random number generation.

The dynamics is through alternate applications of two maps $T_0$ and $T_1$: the first
map updates the even sites (i.e., the sites $(i,j)$ with $i+j$ even) and the second updates
the odd sites. The updating is done in such a way that the energy is preserved: a spin
is flipped if and only if the flipping does not change the total energy of the site
and its four immediate neighbours. More specifically, let us say that a site $(i,j)$ is
balanced on a configuration $x$ if half of the neighbouring spins $x_{i+1,j}$, $x_{i,j+1}$, $x_{i-1,j}$
and $x_{i,j-1}$ are upward and the other half are downward. For a spin $a \in \{\uparrow, \downarrow\}$, let $\bar{a}$
denote the spin with the opposite direction as $a$. Then,

$$(T_0 x)_{i,j} = \begin{cases} \bar{x}_{i,j} & \text{if } i+j \text{ even and } (i,j) \text{ balanced on } x, \\ x_{i,j} & \text{otherwise,} \end{cases}$$

and similarly for $T_1$. The dynamical system defined by the composition $T = T_1 T_0$ is
called the Q2R cellular automaton.

The Q2R system is reversible and symmetric under time reversal in the sense
that $T^{-1} = T_0 T_1$ (each of $T_0$ and $T_1$ is its own inverse). By construction, it also
conserves the energy. The conservation of energy can be formulated in various equivalent
ways. For us, it suffices to say that $T$ (indeed, each of $T_0$ and $T_1$) keeps the average
energy per site invariant. Note that the average energy per site of a configuration $x$ is a
function of its macroscopic state. The set of translation-invariant probability measures
with a given average energy per site is convex and closed under the topology of weak
convergence. Therefore, any limit or Cesàro limit of the measures $T^t\pi_x$ will have the
same average energy per site as $\pi_x$.
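A minimal sketch of Q2R on an $L \times L$ torus with $L$ even (an assumption in the spirit of the finite simulation behind Figure 6), encoding $\uparrow$ as $+1$ and $\downarrow$ as $-1$, so that a site is balanced exactly when its four neighbours sum to 0. Because the neighbours of an even site are all odd, the simultaneous flips in $T_0$ never touch both ends of a bond, which is why the energy is conserved; and applying a half-step twice is the identity, which gives $T^{-1} = T_0 T_1$. The names `half_step`, `q2r_step` and `energy` are illustrative.

```python
import numpy as np

def neighbour_sum(x):
    """Sum of the four nearest-neighbour spins (+1 up, -1 down) on a torus."""
    return (np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0)
            + np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1))

def half_step(x, parity):
    """T_0 (parity 0) or T_1 (parity 1): flip the balanced spins on one sublattice."""
    i, j = np.indices(x.shape)
    flip = ((i + j) % 2 == parity) & (neighbour_sum(x) == 0)
    return np.where(flip, -x, x)

def q2r_step(x):
    return half_step(half_step(x, 0), 1)          # T = T_1 T_0: apply T_0 first

def energy(x):
    """-1 per aligned and +1 per anti-aligned neighbouring pair on the torus."""
    return -int(np.sum(x * np.roll(x, 1, axis=0)) + np.sum(x * np.roll(x, 1, axis=1)))

rng = np.random.default_rng(0)
L = 64                                            # even, so the checkerboard closes up
x = np.where(rng.random((L, L)) < 0.1, 1, -1)     # coin flips, probability 0.1 of "up"
y = x
for _ in range(500):
    y = q2r_step(y)
print(energy(x) == energy(y))                      # True: energy is conserved
z = y
for _ in range(500):                               # T^{-1} = T_0 T_1: apply T_1, then T_0
    z = half_step(half_step(z, 1), 0)
print(np.array_equal(z, x))                        # True: the initial state is recovered
```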

As before, we consider the uniform Bernoulli measure on the configuration space
$\{\uparrow,\downarrow\}^{\mathbb{Z}^2}$ to be a representation of a “maximally disordered” state, because it assigns
the same probability $2^{-|A|}$ to all cylinder sets

$$[q] = \{x \in \{\uparrow,\downarrow\}^{\mathbb{Z}^2} : x_i = q_i \text{ for } i \in A\}$$

with base a finite set $A \subseteq \mathbb{Z}^2$. Put another way, in a typical spin configuration obtained
by independent unbiased coin flips, (translations of) each finite pattern $q : A \to \{\uparrow,\downarrow\}$
appears with the same frequency $2^{-|A|}$. Another way to express this is to say that the entropy

$$H_A(\mu) = -\sum_{q \in \{\uparrow,\downarrow\}^A} \mu([q]) \log \mu([q])$$

of any finite window $A \subseteq \mathbb{Z}^2$ has its maximum value $|A|\log 2$ if $\mu$ is the uniform
Bernoulli measure.


Strictly speaking, this is not a cellular automaton with the common definition of the term,
because the even and odd sites are treated differently. It can, however, be recoded into a standard
cellular automaton.
The description of a “maximally disordered” state with a given average energy
per site is more subtle. Indeed, since the constraint is not local, it might not be
possible to maximize the entropy simultaneously for all finite windows $A$.

However, if $B \supseteq A$, a larger value for $H_B(\mu)$ is a better indication of disorder than
a larger value for $H_A(\mu)$. Therefore, one may measure the disorder by the limit
entropy per site

$$h(\mu) = \lim_{n\to\infty} \frac{H_{I_n}(\mu)}{|I_n|},$$

where $I_n = [-n,n] \times [-n,n]$. A maximally disordered state with a given average
energy per site may therefore be identified with an ergodic translation-invariant measure
that has the prescribed average energy per site and maximal entropy per site
subject to the energy constraint. These are the ergodic Gibbs measures for the Ising
model (see e.g. [?]).

Figure 6 shows a few snapshots from a simulation of the Q2R cellular automaton
starting from a biased Bernoulli configuration. At the beginning, the spins gradually
cluster, even though the total length of the boundaries between upward and downward
clusters remains constant. After a while, the macroscopic picture of the configuration
appears to reach an equilibrium, which resembles a typical configuration
chosen according to a Gibbs measure of the Ising model. More elaborate simulations
have shown numerical agreement with the Ising model (see e.g. [?]),
hence supporting the conjecture that Q2R indeed randomizes coin-flip configurations
within the corresponding average energy per site level.


In the next two sections, we attempt to make the concepts of macroscopic state 
and maximally disordered state more precise. 


3 Macroscopic States 

Let us fix a symbol set $S$ and denote by $S^{\mathbb{Z}^d} = \{x : \mathbb{Z}^d \to S\}$ the set of all
$d$-dimensional configurations of symbols from $S$. A configuration $x \in S^{\mathbb{Z}^d}$ is considered
as a microscopic state of a system. The macroscopic state of $x$ consists of all information
in $x$ that is visible through “macroscopic observations”. Which observations
are considered macroscopic is somewhat arbitrary and depends on the physical context.
Here, we equate “macroscopic” with “statistical”: a macroscopic observation
would amount to identifying the frequency of a fixed pattern, or the spatial average
of a “microscopic observation”.

For translation-invariant measures, the limit exists and is equal to $\inf_n H_{I_n}(\mu)/|I_n|$.

In the standard Ising model, the ergodic Gibbs measures are considered to be suitable descriptions
of the macroscopic states of equilibrium for the ferromagnetic material (see e.g. [?]).

Again, the simulation is made on a torus rather than the infinite lattice $\mathbb{Z}^2$. The “equilibrium”
configurations could be compared with a random configuration generated by a Gibbs sampler for
the Ising model.

[Figure 6: panels t = 0, t = 500, t = 5000]

Fig. 6 Simulation of the Q2R cellular automaton starting from a coin-flip configuration with probability
$p = 0.1$ of having $\uparrow$ (represented by black). After a relatively short while, the macroscopic
look of the configuration seems to reach an equilibrium with upward and downward spins clustered
together.

To be more specific, let us call a function $f : S^{\mathbb{Z}^d} \to \mathbb{R}$ a local observable if the
symbols $x_i$ at finitely many sites $i \in A$ are sufficient to determine $f(x)$. For instance,
if $q : A \to S$ is a pattern on a finite set $A$, the function $x \mapsto 1_{[q]}(x)$ that has value 1 if
$x$ agrees with $q$ on $A$ and 0 otherwise is a local observable. Furthermore, any local
observable is a linear combination of observables of this type.

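If $f$ is a local observable, the spatial average

$$\overline{f}(x) = \lim_{n\to\infty} \frac{1}{|I_n|}\sum_{i \in I_n} f(\sigma^i x) \tag{2}$$

will be considered as a macroscopic observable. As before, $I_n = [-n,n]^d$, and $\sigma^i$
denotes the translation by $i$. For instance, $\overline{1_{[q]}}(x)$ is simply the frequency of the
occurrence of (translations of) the pattern $q$ in $x$. The limit in (2) may or may not exist.
If well-defined, the collection $(\overline{f}(x) : f \text{ local})$ defines a unique translation-invariant
probability measure $\pi_x$ with

$$\pi_x(f) = \int f \, d\pi_x = \overline{f}(x) \quad \text{for every local observable } f,$$

describing the statistics of $x$. In particular, $\pi_x([q]) = \overline{1_{[q]}}(x)$ for any finite pattern $q$.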

Every translation-invariant probability measure on $S^{\mathbb{Z}^d}$ arises from a configuration
in the above fashion [?]. Nevertheless, not every translation-invariant probability
measure should be considered as an unambiguous macroscopic state. To illustrate
this, consider a one-dimensional configuration $z$ with $z_i = 0$ for $i < 0$ and $z_i = 1$
for $i \geq 0$. Then $\pi_z = \frac{1}{2}(\delta_0 + \delta_1)$, where $\delta_0$ and $\delta_1$ are the point-mass measures at
the uniform configurations $\mathbf{0}$ (all 0s) and $\mathbf{1}$ (all 1s). The measure $\frac{1}{2}(\delta_0 + \delta_1)$, however,
suggests an ambiguous situation in which we are uncertain about which of $\mathbf{0}$ and $\mathbf{1}$ is the real
configuration of the system. The ambiguity comes from the fact that the configuration
$z$ lacks homogeneity: its left and right tails have different macroscopic looks.

Here, we focus on configurations that are homogeneous. We call a configuration $x$
homogeneous if

i) $\pi_x$ is well-defined (i.e., the spatial average $\overline{f}(x)$ of every local observable $f$
exists on $x$),

ii) $\pi_x$ cannot be written as a non-trivial convex combination of other translation-invariant
measures (i.e., $\pi_x$ is ergodic for the group of translations), and

iii) $x$ is a point of density for $\pi_x$, which is to say, every finite pattern occurring in $x$
occurs with positive frequency.

The measure $\pi_x$ describing the statistical averages of a homogeneous configuration
$x$ will be called the macroscopic state of $x$. The set of homogeneous configurations
sharing the same ergodic measure $\pi$ as macroscopic state is called the ergodic set
of $\pi$. The countability of the set of finite patterns together with the ergodic theorem
implies that the ergodic set of any ergodic measure $\pi$ has measure 1 with respect to
$\pi$ (see [27]). Hence, one may think of a configuration in the ergodic set of $\pi$ as a
typical configuration with macroscopic state $\pi$.
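To make the inhomogeneous configuration $z$ concrete, the sketch below computes the finite-window averages from (2) in one dimension, with a configuration modelled as a Python function from integers to symbols (an illustrative encoding; all names are hypothetical). The frequency of the symbol 1 tends to $1/2$, as $\pi_z = \frac{1}{2}(\delta_0 + \delta_1)$ predicts, while the pattern 01, which does occur in $z$, has frequency tending to 0; so $z$ violates condition iii) as well as condition ii).

```python
from fractions import Fraction

def window_average(f, x, n):
    """(1/|I_n|) * sum_{i in I_n} f(sigma^i x) with I_n = [-n, n] (one dimension).
    sigma^i x is the configuration j -> x(i + j)."""
    total = sum(f(lambda j, i=i: x(i + j)) for i in range(-n, n + 1))
    return Fraction(total, 2 * n + 1)

def z(i):
    """The configuration z: 0 on the negative sites, 1 elsewhere."""
    return 0 if i < 0 else 1

def one_at_origin(x):
    """Local observable 1_[1]: indicator that the symbol at the origin is 1."""
    return int(x(0) == 1)

def pattern_01(x):
    """Local observable: indicator of the pattern 01 at sites 0, 1."""
    return int(x(0) == 0 and x(1) == 1)

for n in (10, 1000, 10000):
    print(n, float(window_average(one_at_origin, z, n)),
          float(window_average(pattern_01, z, n)))
```

Exact fractions make the limits easy to read off: the two averages are $(n+1)/(2n+1)$ and $1/(2n+1)$.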


4 Maximally Disordered States 


From the definition, it follows that, for any finite window $A \subseteq \mathbb{Z}^d$, the entropy $H_A(\pi)$
of a macroscopic state (i.e., a translation-ergodic measure) $\pi$ agrees with the empirical
entropy $H_A(x)$ of any configuration $x$ that is typical for $\pi$. The entropy $H_A(\pi)$ is
a concave continuous function of $\pi$, taking its maximum value $|A| \log |S|$ only when
$\pi$ assigns equal probabilities to every cylinder with base $A$.

See [2], Paragraph (7.8), for a similar reasoning.

As an example in which inhomogeneity does not arise from left-right asymmetry, let $m_0 = 0$ and
$m_k = 2(1^2 + 2^2 + \cdots + k^2)$, and construct a one-dimensional configuration $z : \mathbb{Z} \to \{0,1\}$ with $z_i = 0$
if $m_k \leq |i| < m_k + (k+1)^2$ and $z_i = 1$ if $m_k + (k+1)^2 \leq |i| < m_{k+1}$. Then, again, $\pi_z = \frac{1}{2}(\delta_0 + \delta_1)$.

Such points are called regular in [22].

It might not be intuitively clear why $\pi_x$ should be required to be ergodic in order for $x$ to be
called homogeneous. A perhaps more plausible condition equivalent to the ergodicity of $\pi_x$ is that
for every local observable $f$ and each $\varepsilon > 0$, the upper density of the set

$$\Big\{ a \in \mathbb{Z}^d : \Big| \frac{1}{|I_m|}\sum_{i \in I_m} f(\sigma^{a+i} x) - \overline{f}(x) \Big| > \varepsilon \Big\}$$

in $\mathbb{Z}^d$ goes to 0 as $m \to \infty$ [27]. Note that for both examples of inhomogeneous configurations
mentioned above, this condition fails for the function $f = 1_{[1]}$ with $f(x) = 1$ if $x_0 = 1$ and $f(x) = 0$
otherwise.

In particular, the set of homogeneous configurations has measure 1 with respect to any
translation-invariant probability measure.

If need be, stronger notions of homogeneity and typicalness can be obtained by intersecting the
ergodic set of $\pi$ with other sets of measure 1.
The limit entropy per site $h(\pi)$ is affine (hence concave) and takes its maximum
value $\log |S|$ precisely when $\pi$ is the uniform Bernoulli measure, that is, the state with
“maximum disorder”. The map $\pi \mapsto h(\pi)$ is, however, not continuous. For example,
for each $m > 0$, let $x^{(m)}$ be a periodic configuration in $\{0,1\}^{\mathbb{Z}}$ that has each word
of length $m$ exactly once in its period. Then, the macroscopic states $\pi_{x^{(m)}}$ have 0
entropy per site yet converge weakly to the uniform Bernoulli measure as $m \to \infty$.
In fact, every macroscopic state is a weak limit of macroscopic states of periodic
configurations (which all have entropy 0). Nevertheless, entropy per site is upper
semi-continuous: $\pi_n \to \pi$ implies $\limsup_{n\to\infty} h(\pi_n) \leq h(\pi)$ (see e.g. [?]).
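A sketch of the periodic configurations $x^{(m)}$ mentioned above, assuming that the period is produced by the classical Lyndon-word construction of a binary de Bruijn sequence (any cyclic sequence containing each binary word of length $m$ exactly once would do; the function names are illustrative). The block entropy per symbol $H_k/k$ of the period, computed over cyclic windows, equals $\log 2$ for $k \le m$, since each word of length $k$ then occurs exactly $2^{m-k}$ times; for larger $k$ it is at most $m \log 2 / k$, because a period of length $2^m$ contains at most $2^m$ distinct words. This is how short-range statistics can look uniform while the entropy per site is 0.

```python
from collections import Counter
from math import log

def de_bruijn(m, alphabet_size=2):
    """Cyclic sequence over {0, ..., alphabet_size-1} in which every word of
    length m occurs exactly once (classical Lyndon-word construction)."""
    a = [0] * (alphabet_size * m)
    seq = []

    def db(t, p):
        if t > m:
            if m % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, alphabet_size):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

def cyclic_words(seq, k):
    """Counts of the words of length k read cyclically along seq."""
    ext = seq + seq[:k - 1]
    return Counter(tuple(ext[i:i + k]) for i in range(len(seq)))

def block_entropy(seq, k):
    n = len(seq)
    return -sum(c / n * log(c / n) for c in cyclic_words(seq, k).values())

m = 8
period = de_bruijn(m)                               # one period of x^(m)
print(len(period), len(cyclic_words(period, m)))    # 256 256
for k in (1, 4, 8, 24):
    print(k, round(block_entropy(period, k) / (k * log(2)), 3))
```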

Let us now consider a concept of energy as in the Ising model. Namely, let $f : S^{\mathbb{Z}^d} \to \mathbb{R}$ be a local observable, representing the energy contribution of the symbol at the origin when interacting with the nearby symbols. For instance, for the Ising model, we can set

$$f(x) = \begin{cases} \tfrac{1}{2}\bigl(n_\downarrow(\partial_0 x) - n_\uparrow(\partial_0 x)\bigr) & \text{if } x_0 = \uparrow,\\[2pt] \tfrac{1}{2}\bigl(n_\uparrow(\partial_0 x) - n_\downarrow(\partial_0 x)\bigr) & \text{if } x_0 = \downarrow, \end{cases}$$

where $n_\uparrow(\partial_0 x)$ and $n_\downarrow(\partial_0 x)$ are the numbers of upward and downward spins among the four neighbours $x_{1,0}$, $x_{0,1}$, $x_{-1,0}$ and $x_{0,-1}$ of site $0$. Then $\overline{f}(x)$ represents the average energy per site of a configuration $x$, which is well-defined if $x$ is homogeneous, and agrees with $\pi_x(f)$.
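For a doubly periodic configuration, $\overline{f}(x)$ is simply the average of $f$ over one period, so it can be computed on a finite torus. The following sketch assumes spins encoded as $+1$ (up) and $-1$ (down); the function names are hypothetical.

```python
UP, DOWN = +1, -1  # assumption: spins encoded as +1 (up) and -1 (down)


def local_energy(grid, i, j):
    """f at site (i, j): half the number of disagreeing neighbours minus half
    the number of agreeing ones (this is the case split in the formula for f)."""
    h, w = len(grid), len(grid[0])
    neighbours = (grid[(i + 1) % h][j], grid[i][(j + 1) % w],
                  grid[(i - 1) % h][j], grid[i][(j - 1) % w])
    n_up = sum(1 for s in neighbours if s == UP)
    n_down = len(neighbours) - n_up
    if grid[i][j] == UP:
        return 0.5 * (n_down - n_up)
    return 0.5 * (n_up - n_down)


def energy_per_site(grid):
    """Average of f over one period of a doubly periodic configuration."""
    h, w = len(grid), len(grid[0])
    return sum(local_energy(grid, i, j) for i in range(h) for j in range(w)) / (h * w)


L = 4
all_up = [[UP] * L for _ in range(L)]
checkerboard = [[UP if (i + j) % 2 == 0 else DOWN for j in range(L)] for i in range(L)]
stripes = [[UP if i % 2 == 0 else DOWN for j in range(L)] for i in range(L)]
print(energy_per_site(all_up), energy_per_site(checkerboard), energy_per_site(stripes))
```

This prints `-2.0 2.0 0.0`: the aligned configuration has the lowest energy per site, the checkerboard the highest, and horizontal stripes (two agreeing and two disagreeing neighbours per site) sit in between.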

Suppose $e$ is a real number within the range of $f$. Among all the macroscopic states $\pi$ satisfying $\pi(f) = e$, those with maximum entropy per site $h(\pi)$ could be considered as the most disordered. These are the presumed equilibrium states of a system in which the energy $f$ is conserved. Applying the Lagrange multipliers method (Legendre transform), the optimization problem

$$\text{maximize } h(\pi) \quad \text{subject to } \pi(f) = e$$

(for $e$ in the range of $f$) translates into the unconstrained problem

$$\text{maximize } h(\pi) - \beta\,\pi(f) \tag{3}$$

(for $\beta \in \mathbb{R}$). The compactness of the space, the continuity of $\pi \mapsto \pi(f)$ and the upper semi-continuity of $\pi \mapsto h(\pi)$ imply that, in both problems, the maxima are achieved by some translation-invariant probability measures. Dobrushin, Lanford and Ruelle proved that the macroscopic states solving the variational problem (3) are precisely the ergodic Gibbs measures at inverse temperature $\beta$. See EUmEol for more information.

Such a configuration corresponds to an Eulerian circuit on the de Bruijn graph of order $m$.

A similar discussion applies if, rather than a single notion of energy, we have a finite number of conserved quantities $f_1, f_2, \ldots$.

However, the maximum in the first problem is not necessarily achieved by ergodic measures (i.e., macroscopic states). Such a situation corresponds to a first-order phase transition (see [31, 14]).
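The variational problem (3) can be made concrete in the simplest case, which we assume here for illustration: $f$ depends only on the symbol at the origin, $f(x) = c_{x_0}$. Then $\pi(f)$ depends only on the one-site marginal of $\pi$, and $h(\pi)$ is at most the entropy of that marginal, with equality exactly for Bernoulli measures, so (3) reduces to a problem over Bernoulli measures. Its maximizer has $p_a \propto e^{-\beta c_a}$ and maximum value $\log\sum_a e^{-\beta c_a}$. The sketch checks this for a two-letter alphabet by grid search; the energies $c_0, c_1$ and all function names are hypothetical.

```python
import math


def objective(p1, beta, c0, c1):
    """h(pi) - beta * pi(f) for the Bernoulli measure with P(x_0 = 1) = p1,
    when f(x) = c0 if x_0 = 0 and f(x) = c1 if x_0 = 1 (natural logarithms)."""
    p0 = 1.0 - p1
    entropy = -sum(q * math.log(q) for q in (p0, p1) if q > 0)
    return entropy - beta * (p0 * c0 + p1 * c1)


def gibbs_p1(beta, c0, c1):
    """Closed-form maximizer: p_a proportional to exp(-beta * c_a)."""
    w0, w1 = math.exp(-beta * c0), math.exp(-beta * c1)
    return w1 / (w0 + w1)


def grid_argmax(beta, c0, c1, steps=20_000):
    """Brute-force maximizer of the objective over a uniform grid of p1 in [0, 1]."""
    k_best = max(range(steps + 1), key=lambda k: objective(k / steps, beta, c0, c1))
    return k_best / steps


c0, c1 = 0.0, 1.0  # hypothetical single-site energies
for beta in (-1.5, 0.0, 0.7, 2.0):
    p_star = gibbs_p1(beta, c0, c1)
    log_z = math.log(math.exp(-beta * c0) + math.exp(-beta * c1))
    # grid search agrees with the closed form, and the optimal value is log Z
    print(beta, p_star, grid_argmax(beta, c0, c1), objective(p_star, beta, c0, c1), log_z)
```

Negative $\beta$ favours the higher-energy symbol, and $\beta = 0$ recovers the uniform Bernoulli measure, the unconstrained entropy maximizer.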


5 Boltzmann’s Theory and Cellular Automata 

Let us take a moment to draw parallels between the concepts in Boltzmann's gas theory and in cellular automata. We refer to the survey article of the Ehrenfests iH and the book by Kac ini, which contain excellent accounts of Boltzmann's theory and related issues. See also [23] for a general discussion.

Boltzmann considered an isolated collection of $n$ particles (identical hard spheres) interacting via elastic collisions. The particles are assumed to be homogeneously distributed in (a bounded but large region of) space. To be concrete, we may consider a cubic region with periodic boundary conditions. The focus is thus only on the velocities of the particles. Assuming that the number of particles is very large, we take $\rho(v,t)\,dv$ to be the fraction of particles that, at time $t$, have velocities within an infinitesimal neighbourhood $dv$ of each value $v \in \mathbb{R}^3$. Using the assumption of spatial homogeneity, Boltzmann estimated the average number of collisions, in an infinitesimally small time interval $(t, t+dt)$, among particles with velocities close to $u$ and those with velocities close to $v$ (the Stosszahlansatz). The model of elastic collisions (the conservation of energy and momentum) could now be invoked to obtain the new distribution $\rho(v, t+dt)\,dv$ for the velocity of the particles at the end of the time interval $(t, t+dt)$. This leads to an equation describing the time evolution of $\rho(v,t)$, known as the Boltzmann equation. Boltzmann used this equation to show that the quantity $-\int \rho(v,t)\log\rho(v,t)\,dv$ is monotonically increasing in time, except at an equilibrium in which the velocities are distributed according to the Maxwell distribution $\rho(v) \propto e^{-c|v|^2}$ (with a constant $c > 0$ determined by the total energy).

Boltzmann's derivation of the "law of increase in entropy" faced two major criticisms. Loschmidt objected that a system governed by a reversible and time-reversal symmetric dynamics (like a system of particles interacting via elastic collisions) cannot possibly have an observable that is invariant under time reversal (like entropy) and is monotonically increasing in all situations. Zermelo's objection was based on Poincaré's recurrence theorem. According to Liouville's theorem, a Hamiltonian system (such as a system of particles) preserves the phase-space volume (i.e., the Lebesgue measure). Poincaré's theorem states that in a volume-preserving system whose phase space has finite volume (such as an isolated system of particles in a 3-dimensional torus), almost every trajectory eventually returns (infinitely many times) arbitrarily close to its starting point. This again implies that such a system cannot have a continuous observable that is monotonically increasing in time for almost all starting points.

In the current setting, Gibbs measures coincide with full-support Markov measures.

See Chapter I of [4] and Sections III.1-2 of HU.

More specifically, the Stosszahlansatz says that the frequencies of particles with different velocities that enter an infinitesimally small region at any given time are statistically independent.

See Section 7 of [4] and Section III.7 of E).

In order to address these criticisms, Boltzmann later introduced another, more refined framework. In this new setting, each particle $i$ is described by a state variable $x_i$, which could, for instance, consist of the position as well as the velocity of the particle. The phase space of an individual particle (i.e., the range of values of $x_i$) is divided into small equally-sized regions $A_1, A_2, \ldots$. Given a configuration of particles $x$, we can form the fraction $p_k = n_k/n$ of particles whose states are in region $A_k$. Conversely, given the macroscopic information $p = (p_1, p_2, \ldots)$, there corresponds a set $[p]$ consisting of all particle configurations that have fractions $p_1, p_2, \ldots$ of particles in regions $A_1, A_2, \ldots$. If the number of particles is very large, the volume of the set $[p]$ could be measured by the quantity $H(p) = -\sum_k p_k \log p_k$: if each region has volume $v$, the logarithm of the volume of $[p]$ is approximately $nH(p) + n\log v$. Neglecting any interaction energy between particles, the energy of a configuration $x$ could be written as $E(x) = n\sum_k p_k c_k$, where $c_k$ is the (approximate) energy of a particle whose state is within $A_k$. Boltzmann now argued that a system with energy $E$ at equilibrium is most likely to be found (at almost any moment of time) to have a state distribution $p$ that maximizes the quantity $H(p)$ among all the state distributions with energy $E$, for this is the distribution for which $[p]$ takes the overwhelmingly largest portion of the set of all configurations with energy $E$. If $n$ is large, this equilibrium distribution is (approximately) given by $p_k \propto e^{-\beta c_k}$, where $\beta$ is a Lagrange multiplier for tuning $E$.
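The counting behind "the volume of $[p]$ is measured by $H(p)$" can be checked directly. With labelled particles, the number of ways to put $n_k$ of them in region $A_k$ is the multinomial coefficient, and by Stirling's formula its logarithm divided by $n$ tends to $H(p)$ (the extra $n\log v$ term is the same for every $p$). The sketch below, with hypothetical occupation fractions and helper names, evaluates this exactly via `math.lgamma`.

```python
import math


def log_volume_per_particle(counts):
    """(1/n) log of the multinomial coefficient n! / (n_1! n_2! ...): the number of
    ways to place n labelled particles so that n_k of them lie in region A_k."""
    n = sum(counts)
    log_multinomial = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_multinomial / n


def shannon_entropy(p):
    """H(p) = -sum_k p_k log p_k (natural logarithm)."""
    return -sum(q * math.log(q) for q in p if q > 0)


p = (0.5, 0.3, 0.2)  # hypothetical occupation fractions
for n in (10, 100, 10_000, 1_000_000):
    counts = [round(q * n) for q in p]
    # the first column approaches H(p) from below as n grows
    print(n, log_volume_per_particle(counts), shannon_entropy(p))
```

The convergence is slow (the correction is of order $\log n / n$), but for the particle numbers relevant to a gas the approximation is extremely accurate.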

The analogy with cellular automata should be clear. Rather than particles, the elementary pieces of information in cellular automata are carried by lattice sites representing discretized positions in space. The symbol at site $i$ should therefore be compared with the state of particle $i$. The model of elastic collisions governing the interaction between the particles is replaced with the local update rule describing the cellular automaton map $T : S^{\mathbb{Z}^d} \to S^{\mathbb{Z}^d}$. The fraction $\xi_a(x)$ of sites having symbol $a$ is an elementary macroscopic observable analogous to the fraction $n_k/n$ of particles whose states are in region $A_k$, but now it is clear that one must also take into account the macroscopic observables $\xi_q(x)$ (for larger patterns $q : A \to S$), which contain information about correlations between finite collections of sites. Boltzmann's entropy corresponds to the empirical entropy $H_0(x)$ of symbols appearing in the configuration $x$, or more generally, the empirical entropy $H_A(x)$, which measures the lack of bias in the frequencies of the patterns with support $A$ occurring in $x$.
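The empirical entropy $H_A$ can be estimated from a long finite sample. In the sketch below a cyclic sample stands in for an infinite configuration; the sample size, seed and function names are assumptions. It compares a Bernoulli sample $x$ with its image under the XOR rule $(Tx)_i = x_i + x_{i+1} \bmod 2$: for $x$ one has $H_{\{0,1\}} \approx 2H_0$, whereas for $Tx$ the gap is clearly positive, because $(Tx)_0$ and $(Tx)_1$ both depend on $x_1$. Such correlations are invisible to $H_0$ alone.

```python
import math
import random
from collections import Counter


def empirical_entropy(x, A):
    """Empirical entropy H_A (natural log) of the patterns seen through the window A
    at every position of a finite cyclic sample x."""
    n = len(x)
    counts = Counter(tuple(x[(a + i) % n] for i in A) for a in range(n))
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def xor_step(x):
    """XOR rule (Tx)_i = x_i + x_{i+1} mod 2 on a cyclic sample."""
    n = len(x)
    return [(x[i] + x[(i + 1) % n]) % 2 for i in range(n)]


rng = random.Random(0)  # fixed seed for reproducibility
p = 0.2
x = [1 if rng.random() < p else 0 for _ in range(100_000)]  # approx. Bernoulli(p)
y = xor_step(x)

for name, z in (("x", x), ("Tx", y)):
    h0 = empirical_entropy(z, [0])
    h01 = empirical_entropy(z, [0, 1])
    # For a Bernoulli sample h01 is close to 2*h0; a clear gap signals correlations.
    print(name, round(h0, 4), round(h01, 4), round(2 * h0 - h01, 4))
```

With $p = 0.2$ the gap for $Tx$ should come out at roughly $0.03$ nats, while for $x$ it is close to zero.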

To understand Boltzmann's argument about the increase in entropy, consider the XOR cellular automaton (Section 2.1), and let $x$ be a Bernoulli random configuration with parameter $p \in (0,1)$ (i.e., a homogeneous configuration whose macroscopic state is described by the Bernoulli measure with parameter $p$). In particular, the words of length 2 have frequencies

$$\xi_{00}(x) = (1-p)^2, \qquad \xi_{01}(x) = (1-p)p, \qquad \xi_{10}(x) = p(1-p), \qquad \xi_{11}(x) = p^2$$

in $x$. Since $(Tx)_i = 1$ exactly when $x_i \neq x_{i+1}$, it follows that the frequency of occurrence of symbol 1 after one step is $\varphi(p) = \xi_{01}(x) + \xi_{10}(x) = 2p(1-p)$. If $H(p) = -p\log p - (1-p)\log(1-p)$ denotes the binary entropy function, one can easily verify that $H(\varphi(p)) \geq H(p)$, with equality if and only if $p = \tfrac{1}{2}$: indeed, $|\varphi(p) - \tfrac{1}{2}| = \tfrac{1}{2}(1-2p)^2 \leq |p - \tfrac{1}{2}|$, and $H$ is symmetric about $\tfrac{1}{2}$ and strictly increasing on $[0, \tfrac{1}{2}]$. Therefore, it is indeed the case that the entropy $H_0(Tx)$ is larger than $H_0(x)$ unless $p = \tfrac{1}{2}$.

See Chapter II of [4] and Section III.8 of Gil.
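The inequality $H(\varphi(p)) \geq H(p)$ is easy to confirm on a grid of parameters. The sketch below assumes natural logarithms and uses hypothetical helper names; it is a numerical check, not a proof.

```python
import math


def binary_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p), natural log, with 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)


def phi(p):
    """Frequency of symbol 1 in Tx for the XOR rule and x Bernoulli(p):
    (Tx)_i = 1 exactly when x_i != x_{i+1}, which has frequency 2p(1-p)."""
    return 2.0 * p * (1.0 - p)


grid = [k / 1000 for k in range(1, 1000)]
gains = {p: binary_entropy(phi(p)) - binary_entropy(p) for p in grid}
print(min(gain for p, gain in gains.items() if p != 0.5))  # strictly positive
print(gains[0.5])                                           # 0.0 at p = 1/2
```

The smallest gains occur next to $p = \tfrac{1}{2}$, where $x$ is already nearly maximally disordered.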


Boltzmann's assumption about the number of collisions (the Stosszahlansatz) is analogous to the (invalid) assumption that the configuration $Tx$ is also Bernoulli, so that the frequency of occurrence of a word $w = a_1a_2\cdots a_n$ in $Tx$ is simply the product of the frequencies of $a_1, a_2, \ldots, a_n$. If true, this would lead to the conclusion that the entropy increases monotonically in consecutive steps, that is, $H_0(x) \leq H_0(Tx) \leq H_0(T^2x) \leq \cdots$, with equalities only if $H_0(x) = \log 2$. The assumption is of course false. Nevertheless, the randomization property of the XOR cellular automaton (Theorem 1) suggests a mathematically rigorous scenario that makes Boltzmann's conclusion essentially true. Indeed, randomization implies that for any finite set $A \subseteq \mathbb{Z}$, the entropy $H_A(T^t x)$ approaches (after ignoring a negligible set of time steps) its maximum value $|A|\log 2$, even if this convergence might be non-monotonic.
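The non-monotonicity can be seen exactly for $A = \{0\}$. The closed form used below is our own derivation, not taken from the chapter: $(T^t x)_0 = \sum_k \binom{t}{k} x_k \bmod 2$, by Lucas' theorem $\binom{t}{k}$ is odd for exactly $2^{s(t)}$ values of $k$ (with $s(t)$ the number of ones in the binary expansion of $t$), and a sum of $m$ independent Bernoulli($p$) bits is odd with probability $(1 - (1-2p)^m)/2$. The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """H(p) in nats, with 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)


def freq_one(t, p):
    """Frequency of symbol 1 in T^t x for the XOR rule and x Bernoulli(p).

    (T^t x)_0 = sum_k C(t, k) x_k mod 2.  By Lucas' theorem, C(t, k) is odd for
    exactly 2**popcount(t) values of k, and a sum of m independent Bernoulli(p)
    bits is odd with probability (1 - (1 - 2p)**m) / 2.
    """
    m = 2 ** bin(t).count("1")
    return (1.0 - (1.0 - 2.0 * p) ** m) / 2.0


p = 0.1
for t in range(9):
    q = freq_one(t, p)
    print(t, round(q, 4), round(binary_entropy(q), 4))  # H_0(T^t x) is not monotone
```

$H_0(T^t x)$ is larger at $t = 3$ than at $t = 4$, which falls back to the value at $t = 1$ because $4$ and $1$ both have a single binary one. Along times with many binary ones, such as $t = 2^k - 1$, it approaches $\log 2$.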

For cellular automata, the uniform Bernoulli measure plays the role of the Lebesgue measure on the phase space of a Hamiltonian system. The analog of Liouville's theorem is the balance property, which says that any surjective cellular automaton map $T : S^{\mathbb{Z}^d} \to S^{\mathbb{Z}^d}$ preserves the uniform Bernoulli measure. Hence, Poincaré's theorem applies to all surjective cellular automata. It says that starting from almost every configuration $x$ (i.e., any configuration in a set of uniform Bernoulli measure 1), every finite pattern occurring on $x$ (i.e., $q = x_A$ for some finite set $A \subseteq \mathbb{Z}^d$) reappears on the same position infinitely many times (i.e., $(T^t x)_A = q$ for infinitely many time steps $t$). Note that this does not say anything about a starting configuration that is not typical for the uniform Bernoulli measure.
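The balance property can be tested on finite words. For a one-dimensional cellular automaton whose local rule reads a window of length $r+1$, each word of length $k + r$ is mapped to a word of length $k$, and Hedlund's theorem (a fact from the literature, not stated in this section) says that the map is surjective if and only if every word of length $k$ has exactly $|S|^r$ preimages, for every $k$. The sketch checks only small $k$, so it can detect imbalance but does not by itself prove surjectivity. The AND rule $(Tx)_i = x_i x_{i+1}$ is a hypothetical non-surjective example chosen for contrast.

```python
from collections import Counter
from itertools import product


def preimage_counts(rule, alphabet, r, k):
    """Number of preimages of each word of length k under the sliding-block map
    that applies `rule` to every window of length r + 1."""
    counts = Counter({w: 0 for w in product(alphabet, repeat=k)})
    for u in product(alphabet, repeat=k + r):
        image = tuple(rule(u[i:i + r + 1]) for i in range(k))
        counts[image] += 1
    return counts


def balanced_up_to(rule, alphabet, r, kmax):
    """True if every word of length 1..kmax has exactly |S|**r preimages."""
    target = len(alphabet) ** r
    return all(set(preimage_counts(rule, alphabet, r, k).values()) == {target}
               for k in range(1, kmax + 1))


def xor_rule(w):
    return (w[0] + w[1]) % 2   # (Tx)_i = x_i + x_{i+1} mod 2


def and_rule(w):
    return w[0] * w[1]         # hypothetical non-surjective example


print(balanced_up_to(xor_rule, (0, 1), 1, 8))              # True
print(balanced_up_to(and_rule, (0, 1), 1, 3))              # False
print(preimage_counts(and_rule, (0, 1), 1, 3)[(1, 0, 1)])  # 0: 101 has no preimage
```

Balance is exactly what makes the uniform Bernoulli measure invariant: every cylinder of length $k$ pulls back to $|S|^r$ cylinders of length $k + r$, which have the same total uniform measure $|S|^{-k}$.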

It is worth mentioning that surjective cellular automata preserve the limit entropy per site, that is, $h(T^t\pi) = h(\pi)$ for every macroscopic state $\pi$ and $t = 1, 2, \ldots$ (see e.g. GH). On the other hand, randomization implies a jump at the limit to $h(\mu) > \lim_{t\to\infty} h(T^t\pi) = h(\pi)$ whenever $\pi \neq \mu$. This may be understood as follows. A sub-maximum value for $H_A(\pi)$ expresses the presence of correlations among symbols with relative positions given by $A$. The convergence of the entropy $H_A(T^t\pi)$ to its maximal value $|A|\log|S|$ for larger and larger finite sets $A \subseteq \mathbb{Z}^d$ indicates that the correlations are gradually distributed over larger and larger regions, and are escaping to infinity as $t$ grows to infinity.

Similar (but more cumbersome) calculations lead to the same conclusion for the examples in Sections 2.3 and 2.4. More generally, one can show that if $f : S \times S \to S$ is a function that is permutative in both its arguments, $X$ and $Y$ are independent $S$-valued random variables with distributions $p$ and $p'$, where $p'$ has full support, and $Z = f(X,Y)$, then $H(Z) \geq H(X)$, with equality if and only if $p$ is uniform.

For the XOR cellular automaton, $Tx$ is Bernoulli only if $p = \tfrac{1}{2}$.

Boltzmann's derivation for a system of particles was later made rigorous in a certain asymptotic regime (the Boltzmann-Grad limit) by the ground-breaking work of Lanford [22] and others [2].

If the map $T$ is ergodic (e.g., if $T$ is the XOR map or the XOR-transpose map), the average time between two consecutive reappearances is $2^{|A|}$ by Kac's recurrence theorem.

For simplicity, we assume that $T$ has no non-trivial conserved quantity.


6 How Far the Second Law Goes? 

The phenomenon described by the second law of thermodynamics extends to all physical systems. What constitutes a physical system and what is the exact statement of the second law are much less clear. Let us discuss a few prerequisites for the presence of the randomization effect. For simplicity, we focus on the case that the cellular automaton has no non-trivial conserved quantity.

To fix the terminology, let us say that a cellular automaton $T : S^{\mathbb{Z}^d} \to S^{\mathbb{Z}^d}$ randomizes a translation-invariant probability measure $\pi$ if the Cesàro averages $\frac{1}{n}\sum_{t=0}^{n-1} T^t\pi$ converge weakly to the uniform Bernoulli measure $\mu$ as $n$ goes to infinity. Equivalently, $T$ randomizes $\pi$ if $T^t\pi \to \mu$ along a subsequence $J \subseteq \mathbb{N}$ of density 1 of time steps (see ifTbl, Corollary 1.4).

An obvious case in which randomization fails is when $T$ is not surjective. Lack of surjectivity (or reversibility) has been suggested as a mechanism behind the contrasting phenomenon of self-organization (see e.g. [38]). The requirement for $T$ to be surjective is a relaxation compared to reversibility (let alone time-reversal symmetry), which is common to most microscopic physical theories. Yet, surjectivity already guarantees the invariance of the uniform Bernoulli measure. Moreover, surjective cellular automata are in some way close to being injective: if two distinct configurations $x, y$ differ on at most finitely many positions, then $Tx$ and $Ty$ are distinct (see e.g. iiSl).

Besides surjectivity, the cellular automaton needs to have a certain degree of chaos in order for randomization to occur. For instance, for one-dimensional surjective cellular automata with equicontinuity points, the Cesàro averages $\frac{1}{n}\sum_{t=0}^{n-1} T^t\pi$ with a Bernoulli starting measure $\pi$ converge, but not necessarily to the uniform Bernoulli measure $\mu$. Such cellular automata typically have too many distinct conserved quantities, resulting in the failure of any thermodynamic behaviour (see [34]).

Another obstacle for randomization is too much regularity in the starting configuration. The simplest type of regularity is periodicity. Note that a spatially periodic configuration is also eventually periodic in time. Therefore, no cellular automaton can randomize a periodic configuration. A spatially periodic configuration has zero entropy per site. As a more sophisticated example of a regularity obstructing randomization, consider the XOR cellular automaton. The XOR cellular automaton has the following self-similarity property, which can be verified by induction:

$$(T^{2^n}x)_i = x_i + x_{i+2^n} \pmod 2 \qquad \text{for every } n \geq 0.$$

Define the duplicate of a configuration $x$ as the configuration $Dx$ with $(Dx)_{2i} = (Dx)_{2i+1} = x_i$. It follows from self-similarity that $DTx = T^2Dx$, or more generally $D^nTx = T^{2^n}D^nx$. For the uniform Bernoulli measure $\mu$, in particular, we find that $T^{2^n}D^n\mu = D^n\mu$. Note that if $\pi$ is a translation-ergodic measure, the measure $\overline{D}\pi = \tfrac{1}{2}(D\pi + \sigma D\pi)$ is also translation-ergodic and has entropy per site $\tfrac{1}{2}h(\pi)$. Moreover, if $T\pi = \pi$, we get $T^2\overline{D}\pi = \overline{D}\pi$. Therefore, we have an infinite sequence $\overline{D}\mu, \overline{D}^2\mu, \overline{D}^3\mu, \ldots$ of distinct translation-ergodic measures (i.e., macroscopic states) with positive entropy that are not randomized by the XOR cellular automaton.

A non-surjective cellular automaton may still act surjectively on a natural subspace (e.g., a mixing subshift of finite type). Randomization within such a subspace may still occur.

A cellular automaton $T$ is equicontinuous (or stable) at a configuration $x$ if for each finite set $A$, there is a finite set $B$ such that for any configuration $y$ that agrees with $x$ on $B$, $T^t x$ and $T^t y$ agree on $A$ for every $t \geq 0$. A cellular automaton with equicontinuity points is not sensitive, hence not chaotic (see ED).

A configuration $x$ is said to be spatially periodic if its translation orbit is finite, or equivalently, if there are $d$ linearly independent elements $q_1, q_2, \ldots, q_d \in \mathbb{Z}^d$ such that $x_{a+q_i} = x_a$ for all
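The self-similarity of the XOR cellular automaton and the relation $D^nTx = T^{2^n}D^nx$ can be checked on finite rings, which stand in for infinite configurations. On a ring the XOR map is $I + \sigma$ over $\mathbb{Z}/2$, and $(I+\sigma)^{2^n} = I + \sigma^{2^n}$ by the binomial theorem mod 2, so both identities hold there as well. The ring length, seed and function names below are assumptions made for illustration.

```python
import random


def xor_step(x):
    """(Tx)_i = x_i + x_{i+1} mod 2 on a ring of length len(x)."""
    n = len(x)
    return [(x[i] + x[(i + 1) % n]) % 2 for i in range(n)]


def iterate(x, t):
    """T^t x."""
    for _ in range(t):
        x = xor_step(x)
    return x


def duplicate(x):
    """Dx with (Dx)_{2i} = (Dx)_{2i+1} = x_i (the ring length doubles)."""
    return [s for s in x for _ in range(2)]


def check_self_similarity(x, nmax):
    """(T^(2^k) x)_i == x_i + x_{i+2^k} mod 2 for k = 0..nmax."""
    n = len(x)
    return all(iterate(x, 2 ** k) == [(x[i] + x[(i + 2 ** k) % n]) % 2 for i in range(n)]
               for k in range(nmax + 1))


def check_duplicate_relation(x, nmax):
    """D^n T x == T^(2^n) D^n x for n = 0..nmax."""
    dx, dtx = list(x), xor_step(x)
    for n in range(nmax + 1):
        if dtx != iterate(dx, 2 ** n):
            return False
        dx, dtx = duplicate(dx), duplicate(dtx)
    return True


rng = random.Random(1)
x = [rng.randrange(2) for _ in range(37)]  # hypothetical random ring configuration
print(check_self_similarity(x, 6), check_duplicate_relation(x, 3))  # True True
```

The duplicated configurations are exactly the regular structures that the XOR rule cannot wash out: $T^{2^n}$ acts on $D^n x$ as $T$ acts on $x$, only at a coarser scale.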

In summary, randomization is expected only if the cellular automaton is surjec¬ 
tive and “sufficiently chaotic”, and the starting configuration does not have “too 
much regularity”. 